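Multimodal sentiment analysis has a wide range of applications owing to the complementary information available across modalities. Previous work has focused mainly on learning effective joint representations, but has rarely addressed insufficient unimodal feature extraction or the data redundancy introduced by multimodal fusion. In this paper, we propose a Video-based Cross-modal Auxiliary Network (VCAN), which consists of an audio feature map module and a cross-modal selection module. The first module substantially increases feature diversity in audio feature extraction, aiming to improve classification accuracy by providing more comprehensive acoustic representations. To enable the model to handle redundant visual features, the second module efficiently filters redundant visual frames when integrating audiovisual data. In addition, a classifier group consisting of several image classification networks is introduced to predict sentiment polarities and emotion categories. Extensive experiments on the RAVDESS, CMU-MOSI, and CMU-MOSEI benchmarks show that VCAN significantly outperforms state-of-the-art methods in classification accuracy for multimodal sentiment analysis.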
translated by 谷歌翻译
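A large number of people worldwide suffer from cognitive impairment. Early detection of cognitive impairment is of great importance to both patients and caregivers. However, existing approaches have shortcomings, such as the time and financial cost of clinical visits and neuroimaging. Patients with cognitive impairment have been found to show abnormal emotion patterns. In this paper, we present a novel system based on a deep convolutional network that detects cognitive impairment by analyzing the evolution of facial emotions while participants watch designed video stimuli. In the proposed system, a new facial expression recognition algorithm is built from MobileNet layers and a support vector machine (SVM), and it shows satisfactory performance on 3 datasets. To validate the system, 61 elderly people, including patients with cognitive impairment and healthy people as a control group, were invited to take part in the experiments, and a dataset was built accordingly. On this dataset, the proposed system achieves a detection accuracy of 73.3%.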
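Exemplar-free class-incremental learning requires a classification model to learn new classes incrementally without retaining any old samples. Recently, the framework based on parallel one-class classifiers (POC), which trains a one-class classifier (OCC) independently for each category, has attracted wide attention because it naturally avoids catastrophic forgetting. However, because each OCC is trained independently, POC suffers from weak discriminability and comparability. To address this challenge, we propose a new framework named Discriminative and Comparable One-class classifiers for Incremental Learning (Discoil). Discoil follows the basic principle of POC, but it adopts variational auto-encoders (VAEs) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only estimate the probability that an input sample belongs to a class but also generate pseudo-samples of that class to assist in learning new tasks. Building on this advantage, Discoil trains each new-class VAE in contrast with the old-class VAEs, forcing the new-class VAE to reconstruct new-class samples better and old-class pseudo-samples worse, which improves comparability. In addition, Discoil introduces a hinge reconstruction loss to ensure discriminability. We evaluate our method extensively on MNIST, CIFAR10, and Tiny-ImageNet. The experimental results show that Discoil achieves state-of-the-art performance.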
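FDG-PET reveals altered brain metabolism in individuals with mild cognitive impairment (MCI) and Alzheimer's disease (AD). Some biomarkers derived from FDG-PET by computer-aided diagnosis (CAD) techniques have been shown to accurately distinguish normal controls (NC), MCI, and AD. However, studies on identifying early MCI (EMCI) and late MCI (LMCI) from FDG-PET images remain insufficient. Compared with studies based on fMRI and DTI images, inter-region representation features in FDG-PET images have also received little attention. Moreover, given the variability across individuals, some hard samples that closely resemble both classes limit classification performance. To tackle these problems, this paper proposes a novel bilinear pooling and metric learning network (BMNet), which extracts inter-region representation features and distinguishes hard samples by constructing an embedding space. To validate the proposed method, we collect 998 FDG-PET images from ADNI. After common preprocessing steps, 90 features are extracted from each FDG-PET image according to the Automated Anatomical Labeling (AAL) template and then fed into the proposed network. Extensive 5-fold cross-validation experiments are performed for multiple two-class classification tasks. The experiments show that most metrics improve after adding the bilinear pooling module and the metric losses to the baseline model. Specifically, in the EMCI versus LMCI classification task, specificity improves by 6.38% after adding the triple metric loss, and the negative predictive value (NPV) improves by 3.45% after using the bilinear pooling module.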
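For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how specificity and negative predictive value (NPV) are computed from confusion-matrix counts. The function names and the counts in the usage lines are hypothetical and are not taken from the paper; the abstract also does not say which class (EMCI or LMCI) is treated as the negative class.

```python
def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that are predicted negative: TN / (TN + FP)."""
    return tn / (tn + fp)


def negative_predictive_value(tn: int, fn: int) -> float:
    """Fraction of negative predictions that are actually negative: TN / (TN + FN)."""
    return tn / (tn + fn)


# Hypothetical counts, for illustration only (not results from the paper).
tn, fp, fn = 40, 10, 8
spec = specificity(tn, fp)
npv = negative_predictive_value(tn, fn)
```

Specificity is normalized by the number of true negative-class subjects, while NPV is normalized by the number of negative predictions, so the two can move independently when a module changes the balance of false positives and false negatives.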
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms keep existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of performing complex node or edge perturbation, we construct two views of the graph with specially designed Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, which improves the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity of positive pairs and minimizing that of negative pairs. Extensive experiments on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
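The objective described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `confident` flags, and the exact loss form (mean of `1 - cos` over positive pairs plus mean of `cos` over pairs of different cluster centers) are assumptions chosen only to show positives being pulled together and cluster centers being pushed apart across two views.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_centers(embeddings, labels, confident):
    """Mean embedding of the high-confidence nodes in each cluster."""
    groups = defaultdict(list)
    for z, k, ok in zip(embeddings, labels, confident):
        if ok:
            groups[k].append(z)
    return {k: [sum(col) / len(zs) for col in zip(*zs)] for k, zs in groups.items()}


def cluster_guided_loss(view1, view2, labels, confident):
    """Assumed loss: mean(1 - cos) over positives + mean(cos) over negatives."""
    idx = [i for i, ok in enumerate(confident) if ok]
    # Positives: cross-view pairs of high-confidence nodes in the same cluster.
    pos = [cosine(view1[i], view2[j])
           for i in idx for j in idx if labels[i] == labels[j]]
    # Negatives: cross-view pairs of centers of different clusters.
    c1 = cluster_centers(view1, labels, confident)
    c2 = cluster_centers(view2, labels, confident)
    neg = [cosine(c1[a], c2[b]) for a in c1 for b in c2 if a != b]
    pos_term = sum(1.0 - s for s in pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


# Toy embeddings from two hypothetical encoders (identical here for simplicity).
view1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
view2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
confident = [True, True, True, True]
good = cluster_guided_loss(view1, view2, [0, 0, 1, 1], confident)
bad = cluster_guided_loss(view1, view2, [0, 1, 0, 1], confident)
```

In this toy setting the cluster assignment that matches the embedding geometry gives a lower loss than the mismatched assignment, which is the behavior the objective is meant to reward.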
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient for building an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
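The masking-and-retraining loop described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the R2RISE implementation: `importance_map`, `toy_return`, the keep probability, and the normalization (average return over the masks that kept a frame) are all hypothetical, and `toy_return` stands in for the expensive step of retraining an IL model on the masked demonstrations and measuring its environment return.

```python
import random


def importance_map(num_frames, train_and_evaluate, num_masks=500,
                   keep_prob=0.5, seed=0):
    """Score each frame by the average return of runs in which it was kept."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames   # sum of returns over masks keeping frame t
    kept = [0] * num_frames         # number of masks keeping frame t
    for _ in range(num_masks):
        # Random binary mask: 1 keeps a demonstration frame, 0 drops it.
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)  # return of the retrained policy
        for t, m in enumerate(mask):
            weighted[t] += m * ret
            kept[t] += m
    return [w / c if c else 0.0 for w, c in zip(weighted, kept)]


IMPORTANT_FRAMES = {2, 5}  # hypothetical ground truth for the toy example


def toy_return(mask):
    """Stand-in for retraining: return grows with the important frames kept."""
    return float(sum(mask[t] for t in IMPORTANT_FRAMES))


scores = importance_map(num_frames=8, train_and_evaluate=toy_return)
```

With this toy evaluator, frames 2 and 5 receive higher scores than the others, because runs that keep them obtain a higher return on average.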
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. To let topic extraction facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To let clustering promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and from achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) that integrates both tasks into a unified framework, achieving high-quality clustering results and extracting topics from each cluster simultaneously. Our framework has four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, its self-attention architecture pays close attention to topic-related words, which supports topic extraction. Moreover, the enhanced language model is trained without supervision. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within it.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely covers subjects with severe tumors. Moreover, current models are limited to segmenting specific organs or tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose incorporating the task of pixel restoration to explicitly encode more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful aid to image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
Because they offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, the lack of research on explicitly quantifying the data quality of each view during fusion leaves these models hard to explain, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to the task of aerial-ground dual-view remote sensing scene classification in order to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
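As a rough illustration of the idea of uncertainty-weighted decision-level fusion, the sketch below uses the standard subjective-logic formulation from evidential deep learning, where per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, strength S = sum(alpha_k), and uncertainty u = K / S. The fusion rule shown (weighting each view by 1 - u) is an assumption made for illustration; the abstract does not specify the paper's actual fusion strategy, and all function names and evidence values are hypothetical.

```python
def uncertainty(evidence):
    """Subjective-logic uncertainty u = K / S, with S = sum(e_k + 1)."""
    k = len(evidence)
    strength = sum(evidence) + k
    return k / strength


def expected_probs(evidence):
    """Mean of the Dirichlet distribution: (e_k + 1) / S."""
    strength = sum(evidence) + len(evidence)
    return [(e + 1) / strength for e in evidence]


def fuse(aerial_evidence, ground_evidence):
    """Assumed rule: weight each view's class probabilities by 1 - u."""
    w_a = 1.0 - uncertainty(aerial_evidence)
    w_g = 1.0 - uncertainty(ground_evidence)
    total = w_a + w_g
    if total == 0.0:  # both views carry no evidence: fall back to equal weights
        w_a = w_g = 0.5
        total = 1.0
    p_a = expected_probs(aerial_evidence)
    p_g = expected_probs(ground_evidence)
    return [(w_a * a + w_g * g) / total for a, g in zip(p_a, p_g)]


# Hypothetical non-negative evidence for 3 scene classes from each view.
aerial = [9.0, 0.0, 0.0]   # strong evidence for class 0, low uncertainty
ground = [0.0, 1.0, 0.0]   # weak evidence for class 1, high uncertainty
fused = fuse(aerial, ground)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Here the aerial view has uncertainty 0.25 and the ground view 0.75, so the fused prediction follows the more confident aerial view, which matches the stated aim of giving the lower-risk view more weight.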